NSF PAR Search | NSF Public Access Repository

Sequence-to-Segments Networks for Detecting Segments in Videos

https://doi.org/10.1109/TPAMI.2019.2940225

Wei, Zijun; Wang, Boyu; Hoai, Minh; Zhang, Jianming; Shen, Xiaohui; Lin, Zhe; Mech, Radomir; Samaras, Dimitris (March 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence)
A Modulation Module for Multi-task Learning with Applications in Image Retrieval

Zhao, Xiangyun; Li, Haoxiang; Shen, Xiaohui; Liang, Xiaodan; Wu, Ying (September 2018, Proc. European Conference on Computer Vision)

Multi-task learning has been widely adopted in many computer vision tasks to improve overall computational efficiency or boost the performance of individual tasks, under the assumption that those tasks are correlated and complementary to each other. However, the relationships between tasks are complicated in practice, especially as the number of tasks scales up. When two tasks are only weakly related, they may compete with or even distract each other during joint training of shared parameters, and consequently undermine the learning of all the tasks. This gives rise to destructive interference, which decreases the learning efficiency of the shared parameters and leads to a poor local optimum of the loss with respect to those parameters. To address this problem, we propose a general modulation module that can be inserted into any convolutional neural network architecture to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks, at the cost of only a few additional parameters. Equipped with this module, the gradient directions from different tasks can be enforced to be consistent for the shared parameters, which benefits multi-task joint training. The module is end-to-end learnable without ad-hoc design for specific tasks and can naturally handle many tasks at the same time. We apply our approach to two retrieval tasks, face retrieval on the CelebA dataset [1] and product retrieval on the UT-Zappos50K dataset [2, 3], and demonstrate its advantage over other multi-task learning methods in both accuracy and storage efficiency.
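
The abstract does not spell out the module's internals, so the following Python sketch is only an illustration under stated assumptions: a shared linear layer z = W x, hypothetical linear task heads, and hand-set 0/1 channel masks standing in for the learned, task-specific modulation the abstract describes. It measures the cosine similarity between two tasks' gradients on the shared weights; a negative value signals the kind of destructive interference described above, and routing the tasks through separate channels removes that conflict. All function and variable names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def shared_weight_grad(head, mask, x):
    """Gradient of a task loss w.r.t. a shared weight matrix W, where z = W x.

    Hypothetical setup: the task loss is sum_k head[k] * mask[k] * z[k],
    so dL/dz[k] = head[k] * mask[k] and dL/dW[k][j] = dL/dz[k] * x[j].
    The gradient matrix is flattened row by row.
    """
    dz = [h * m for h, m in zip(head, mask)]
    return [dz_k * x_j for dz_k in dz for x_j in x]

x = [0.5, 2.0]                     # one shared input (illustrative values)
head_a = [1.0, 1.0, 1.0, 1.0]      # task A's linear head over 4 shared channels
head_b = [-1.0, -1.0, -1.0, 1.0]   # task B pulls most channels the other way

# Without modulation, both tasks update every shared channel.
no_mask = [1.0, 1.0, 1.0, 1.0]
g_a = shared_weight_grad(head_a, no_mask, x)
g_b = shared_weight_grad(head_b, no_mask, x)
print(round(cosine(g_a, g_b), 6))  # -0.5: the tasks push W in conflicting directions

# Hand-set task-specific channel masks (assumption: a learned module would produce these).
mask_a = [1.0, 1.0, 0.0, 0.0]
mask_b = [0.0, 0.0, 1.0, 1.0]
g_a = shared_weight_grad(head_a, mask_a, x)
g_b = shared_weight_grad(head_b, mask_b, x)
print(round(cosine(g_a, g_b), 6))  # 0.0: the two updates no longer conflict
```

Hard masks only make the two gradients orthogonal rather than aligned. Per the abstract, the actual module is learned end to end and also encourages feature sharing between related tasks, which this toy example does not model.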
MAttNet: Modular Attention Network for Referring Expression Comprehension

Yu, Licheng; Lin, Zhe; Shen, Xiaohui; Yang, Jimei; Lu, Xin; Bansal, Mohit; Berg, Tamara L. (June 2018, IEEE Conference on Computer Vision and Pattern Recognition)

In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. Demo and code are provided.
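
A minimal Python sketch of the dynamic score combination described above, assuming the overall score is a weighted sum of the subject, location, and relationship scores, with weights given by a softmax over language-derived logits. The logits and per-region scores are made-up numbers, not outputs of a trained model, and the function names are hypothetical.

```python
import math

MODULES = ("subject", "location", "relationship")

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def overall_scores(module_logits, region_scores):
    """Combine per-module matching scores into one score per candidate region.

    module_logits: one logit per module (assumed to come from the language side).
    region_scores: for each region, a dict mapping module name -> score.
    Returns the weighted sum of module scores for each region.
    """
    weights = dict(zip(MODULES, softmax(module_logits)))
    return [sum(weights[m] * scores[m] for m in MODULES) for scores in region_scores]

# Hypothetical location-heavy expression: the location logit is the largest.
logits = [1.0, 2.0, 0.0]
regions = [
    {"subject": 0.9, "location": 0.2, "relationship": 0.5},  # region 0
    {"subject": 0.7, "location": 0.9, "relationship": 0.4},  # region 1
]
scores = overall_scores(logits, regions)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # 1
```

Because the weights depend on the expression, the same region scores can yield a different winner for another expression: with logits such as [3.0, 0.0, 0.0], which favor the subject module, region 0 scores higher.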